50 research outputs found

    IBM Power™ 8 experiments

    Altimesh Hybridizer

    How Pascal And Power 8 Will Accelerate Counterparty Risk Calculations

    Using CLANG/LLVM Vectorization to Generate Mixed Precision Source Code

    Hybrid Vector Library-From Memory Bound to Compute Bound with NVVM

    Existing source code usually interleaves data management, error checking, text processing and actual compute. On general-purpose processors, this mixture of tasks is not necessarily an issue, and performance levels are often satisfactory as is. However, when trying to use a GPU, this hybrid computing turns into a coding challenge. Each individual computing task does not provide sufficient workload, and porting the whole application requires a significant investment in the software asset. We propose an alternate approach with runtime compilation based on function calls on a compute library. Hybrid Vector Library (HVL) operates on vectors, in a manner similar to BLAS level 1 routines, with other functions such as square root or exponential, or MKL routines. In essence, all operations are performed on a vector of values. We illustrate the performance results of this approach on a typical financial benchmark. Existing solutions such as ArrayFire do not allow custom device functions to be called in the middle of a sequence of level 1 routines. We address that issue by also processing these functions. We follow the call graph from the main compute routine, and generate cubin files for user-defined device functions. These functions are then linked at runtime to the HVL call sequence, and usually generate a JCAL instruction in SASS, in a similar way to sqrt. Our approach gives similar benefits to the user's code as ArrayFire, with the flexibility of custom device functions.
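The idea of chaining level 1 style vector calls and a user-defined function into one fused pass can be sketched in plain Python. This is only an illustration under stated assumptions: `Expr`, `axpy` and `payoff` are hypothetical names, not the HVL API, and the deferred closures stand in for the runtime compilation and linking that the abstract describes.

```python
import math
from typing import Callable, Iterable, List


class Expr:
    """Deferred element-wise expression (hypothetical, not the HVL API).

    Nothing is computed until evaluate() is called, so a chain of vector
    operations runs as one pass with no intermediate vectors.
    """

    def __init__(self, n: int, element: Callable[[int], float]):
        self.n = n
        self.element = element  # computes the i-th output value on demand

    @staticmethod
    def from_list(values: Iterable[float]) -> "Expr":
        data = list(values)
        return Expr(len(data), lambda i: data[i])

    def _binary(self, other: "Expr", op: Callable[[float, float], float]) -> "Expr":
        if self.n != other.n:
            raise ValueError("vector lengths differ")
        a, b = self.element, other.element
        return Expr(self.n, lambda i: op(a(i), b(i)))

    def __add__(self, other: "Expr") -> "Expr":
        return self._binary(other, lambda x, y: x + y)

    def map(self, fn: Callable[[float], float]) -> "Expr":
        """Apply a scalar function (built-in or user-defined) to every element."""
        a = self.element
        return Expr(self.n, lambda i: fn(a(i)))

    def evaluate(self) -> List[float]:
        # Single fused pass over the data.
        return [self.element(i) for i in range(self.n)]


def axpy(alpha: float, x: Expr, y: Expr) -> Expr:
    """BLAS level 1 style operation: alpha * x + y, still deferred."""
    return x.map(lambda v: alpha * v) + y


def payoff(v: float) -> float:
    """Hypothetical user-defined function inserted in the middle of the chain."""
    return max(v - 1.0, 0.0)


if __name__ == "__main__":
    x = Expr.from_list([0.5, 2.0, 8.0])
    y = Expr.from_list([0.0, 0.0, 0.0])
    print(axpy(2.0, x, y).map(math.sqrt).map(payoff).evaluate())
```

In this sketch the custom function is just another link in the deferred chain, which mirrors the abstract's point that user-defined functions are processed alongside the library routines rather than forcing a break in the sequence.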

    Image Processing in Java Running on GPU

    Altimesh Hybridizer™ Enabling Accelerators in .Net and more

    Shadow Computations using Robust Epsilon Visibility

    Analytic visibility algorithms, for example methods which compute a subdivided mesh to represent shadows, are notoriously non-robust and hard to use in practice. We present a new method based on a generalized definition of extremal stabbing lines, which are the extremities of shadow boundaries. We treat scenes containing multiple edges or vertices in degenerate configurations (e.g., collinear or coplanar). We introduce a robust epsilon method to determine whether each generalized extremal stabbing line is blocked, or is touched by these scene elements, and thus added to the line's generators. We develop robust blocker predicates for polygons which are smaller than epsilon. For larger values, small shadow features merge and eventually disappear. We can thus robustly connect generalized extremal stabbing lines in degenerate scenes to form shadow boundaries. We show that our approach is consistent, and that shadow boundary connectivity is preserved when features merge. We have implemented our algorithm, and show that we can robustly compute analytic shadow boundaries to the precision of our chosen epsilon threshold for non-trivial models containing numerous degeneracies.

    Flexible Point-Based Rendering on Mobile Devices

    Point-based rendering is a compact and efficient means of displaying complex geometry. For mobile devices which typically have limited CPU or floating point speed, limited memory, no graphics hardware and a small display, a hierarchical packed point based representation of objects is particularly well adapted. We introduce -grids, which are a generalization of previous octree based representations and analyse their memory and rendering efficiency. By storing intermediate node attributes, our structure allows flexible rendering, permitting efficient local image refinement, required for example when zooming into very complex scenes. We also introduce a novel and efficient one-pass shadow mapping algorithm using this data structure. We show an implementation of our method on a PDA, which can render objects sampled by 1.3 million points at 2.1 frames per second; the model was originally made up of 4.7 million polygons.
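The role of intermediate node attributes can be illustrated with a plain octree, which is the representation the abstract says its structure generalizes. The sketch below is an assumption-laden illustration, not the paper's data structure: `Node`, `build` and `samples` are hypothetical names, points are assumed to lie in the unit cube, and the only stored attribute is the mean position of the points below each node.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]


@dataclass
class Node:
    center: Point  # mean of all points below this node (intermediate attribute)
    count: int     # number of points below this node
    children: Dict[int, "Node"] = field(default_factory=dict)


def build(points: List[Point], origin: Point = (0.0, 0.0, 0.0),
          size: float = 1.0, depth: int = 0, max_depth: int = 4) -> Node:
    """Build an octree over a non-empty list of points inside the cube at origin."""
    n = len(points)
    mean = tuple(sum(p[k] for p in points) / n for k in range(3))
    node = Node(mean, n)
    if depth == max_depth or n == 1:
        return node
    half = size / 2.0
    buckets: Dict[int, List[Point]] = {}
    for p in points:
        # One bit per axis selects the octant containing the point.
        idx = sum(1 << k for k in range(3) if p[k] >= origin[k] + half)
        buckets.setdefault(idx, []).append(p)
    for idx, sub in buckets.items():
        child_origin = tuple(
            origin[k] + (half if (idx >> k) & 1 else 0.0) for k in range(3)
        )
        node.children[idx] = build(sub, child_origin, half, depth + 1, max_depth)
    return node


def samples(node: Node, level: int) -> List[Point]:
    """Return one representative point per node, descending at most `level` levels."""
    if level == 0 or not node.children:
        return [node.center]
    out: List[Point] = []
    for child in node.children.values():
        out.extend(samples(child, level - 1))
    return out


if __name__ == "__main__":
    root = build([(0.1, 0.1, 0.1), (0.2, 0.2, 0.2), (0.9, 0.9, 0.9)])
    for level in range(4):
        print(level, len(samples(root, level)))
```

Because every interior node carries a usable representative, a renderer can stop the traversal early for a coarse image and descend further only where more detail is needed, which is the kind of local refinement the abstract refers to.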